This library lets users view example data visualisations built with the NHS themes and NHS public data sets. Users can load the required packages, create the sample data sets, and then run whichever data visualisations they choose in their own environment.

The foundations of this library come from a ggplot2 guide by Mike Perham. It has been expanded as a proof of concept for collaborative working, with regions contributing to and building upon it, to become the R Data Viz Library.

For more information about each data visualisation type, the following resource is recommended:

The Data Visualisation Catalogue: https://datavizcatalogue.com/index.html

Install/load required packages

if (!require("pacman")) install.packages("pacman"); library(pacman)

pacman::p_load(Rcpp, tidyverse,dplyr,tidyr,
               ggplot2,ggthemes,ggtext,scales,
               png,ggalt,NHSRdatasets,onsr,shinycssloaders,plotly)

# NHSRtheme and fingertipsR are installed from GitHub here
# install.packages("devtools")
# devtools::install_github("nhs-r-community/NHSRtheme")
# install.packages("remotes")
remotes::install_github("rOpenSci/fingertipsR",
                        build_vignettes = TRUE,
                        dependencies = "suggests",
                        build = FALSE)

Load sample data

All of the examples in this document use A&E dummy data from the NHSRdatasets package for NHS reporting, Fingertips data for public health, and ONS data for population. Together these give us broad data sets that suit different data visualisation types.

More information on these packages can be found here:

NHSRdatasets:

https://github.com/nhs-r-community/NHSRdatasets

https://nhs-r-community.github.io/NHSRdatasets/

#Load initial dataset: keep five providers and type 1 activity only,
#then drop the type and breaches columns (columns 3 and 5)
Attends <- NHSRdatasets::ae_attendances %>%
  filter(org_code %in% c("RXQ", "RTH", "RHW", "RTK", "RA2")) %>%
  filter(type == "1") %>%
  select(-type, -breaches)

Fingertips:

https://github.com/ropensci/fingertipsR

The code below retrieves population by sex and age band, then prepares it for the population pyramid chart.

# Load fingertipsR package
library(fingertipsR)

# Get available profiles in fingertips
profiles_data <- profiles()
print(profiles_data)

# Find profiles related to population
population_profiles <- profiles_data %>%
  filter(grepl("population", ProfileName, ignore.case = TRUE))
print(population_profiles)

# Search for indicators related to population structure.
# The DomainID was found by viewing population_profiles and selecting a DomainID.
population_indicators <- indicator_metadata(DomainID = "1938133081") # Replace with the DomainID found above

# Get the relevant indicator of the measure from your list of population_indicators
indicator_id <- 92708  # Replace with the actual indicator ID from the results above

#Get list of area types
area_types_data <- area_types()

# Get the data for the specific indicator.
# AreaTypeID 15 is England. You can select all area types, but it will take a long time.
population_data <- fingertips_data(IndicatorID = indicator_id, AreaTypeID = "15")

# Filter data to include only relevant columns and non-NA values
population_data_filtered <- population_data %>%
  filter(!is.na(Age), !is.na(Value)) %>%
  select(AreaName, Sex, Age, Value)

# Adjust the values for plotting (male values negative for pyramid structure)
population_data_filtered <- population_data_filtered %>%
  mutate(Value = ifelse(Sex == "Male", -Value, Value)) %>%
  filter(Age != "All ages") %>%
  filter(Sex != "Persons")

# Convert the Age column to a factor and specify the levels in the desired order
population_data_filtered$Age <- factor(
  population_data_filtered$Age,
  levels = c("0-4 yrs", "5-9 yrs", "10-14 yrs", "15-19 yrs", "20-24 yrs",
             "25-29 yrs", "30-34 yrs", "35-39 yrs", "40-44 yrs", "45-49 yrs",
             "50-54 yrs", "55-59 yrs", "60-64 yrs", "65-69 yrs", "70-74 yrs",
             "75-79 yrs", "80-84 yrs", "85-89 yrs", "90+ yrs")
)
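
The sign flip used for the pyramid can be checked on a tiny data frame before running it on the Fingertips download. The sketch below is illustrative only: toy_pop and its counts are made up and are not Fingertips values.

```r
library(dplyr)

# Made-up counts for two age bands (illustrative values, not Fingertips data)
toy_pop <- data.frame(
  Sex   = c("Male", "Female", "Male", "Female"),
  Age   = c("0-4 yrs", "0-4 yrs", "5-9 yrs", "5-9 yrs"),
  Value = c(120, 115, 130, 125)
)

# Flip the sign of the male counts so those bars extend to the left of zero,
# and fix the order of the age bands with a factor
toy_pyramid <- toy_pop %>%
  mutate(
    Value = ifelse(Sex == "Male", -Value, Value),
    Age   = factor(Age, levels = c("0-4 yrs", "5-9 yrs"))
  )

toy_pyramid
```

Male rows end up negative and female rows stay positive, which is why the pyramid chart later formats its axis labels with abs().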

ONS:

https://medium.com/@VickyCrockett1/how-do-you-get-data-into-r-from-the-ons-c860043fef8c

The next step loads the data and performs some simple filtering steps.

NHSR Theme

Several examples in this document use the NHS colour palette from the NHSRtheme package (more information on this package can be found here: https://github.com/nhs-r-community/NHSRtheme). As the package is not on CRAN, you need to use devtools to install it from GitHub. The output below lists the NHS colour names and their hex codes.

##   DarkBlue       Blue BrightBlue  LightBlue   AquaBlue      Black   DarkGrey 
##  "#003087"  "#005EB8"  "#0072CE"  "#41B6E6"  "#00A9CE"  "#231f20"  "#425563" 
##    MidGrey   PaleGrey  DarkGreen      Green LightGreen  AquaGreen     Purple 
##  "#768692"  "#E8EDEE"  "#006747"  "#009639"  "#78BE20"  "#00A499"  "#330072" 
##   DarkPink       Pink    DarkRed        Red     Orange WarmYellow     Yellow 
##  "#7C2855"  "#AE2573"  "#8A1538"  "#DA291C"  "#ED8B00"  "#FFB81C"  "#FAE100"
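
If NHSRtheme is not installed, the hex codes printed above can still be used directly. The following is a minimal sketch, assuming only ggplot2 is available; nhs_blues and toy_df are hypothetical names, and the plotted values are made up.

```r
library(ggplot2)

# Hex codes copied from the palette output above
nhs_blues <- c(DarkBlue = "#003087", Blue = "#005EB8", LightBlue = "#41B6E6")

# Made-up values purely for illustration
toy_df <- data.frame(
  group = c("A", "B", "C"),
  value = c(10, 15, 7)
)

palette_plot <- ggplot(toy_df, aes(x = group, y = value, fill = group)) +
  geom_col() +
  # unname() so colours are matched to groups by position rather than by name
  scale_fill_manual(values = unname(nhs_blues)) +
  theme(legend.position = "none")

palette_plot
```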

Data Over Time

Basic line chart

ggplot

#Filter initial dataset
line_df <- Attends %>%
  filter(org_code=="RXQ")

#Make plot
ggplot(line_df, aes(x = period, y = attendances)) +
  geom_line(colour = "#005EB8", size = 1.5) +
  scale_y_continuous(labels = comma) +
  labs(title="Type 1 attendances - Bucks Healthcare",
       subtitle = "April 2016 to March 2019",
       y = "Attendances",
       x = "Month") +
  expand_limits(y = 0)

Plotly

NEY TO ADD IN

Multiple line chart

ggplot

#Filter initial dataset
multiple_line_df <- Attends %>%
  filter(org_code == "RXQ" | org_code=="RTH") 

#Make plot
ggplot(multiple_line_df,
         aes(x = period, y = attendances, colour = org_code)) +
  geom_line(size = 1) +
  geom_point() +
  scale_colour_manual(values = c("#005EB8", "#41B6E6")) +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Type 1 attendances - Bucks Healthcare vs Oxford University Hospitals",
    subtitle = "April 2016 to March 2019",
    y = "Attendances",
    x = "Month"
  ) +
  expand_limits(y = 0) +
  theme(legend.title = element_blank())

Plotly

NEY TO ADD IN

SPC Charts

Comparisons

Simple bar chart

ggplot

#Filter initial dataset
bar_df <- Attends %>%
  filter(period == "2019-03-01")

#Make plot
bar <- ggplot(bar_df, aes(x = org_code, y = attendances)) +
  geom_bar(stat = "identity",
           position = "identity",
           fill = "#005EB8") +
  geom_hline(yintercept = 0,
             size = 1,
             colour = "#333333") +
  scale_y_continuous(labels = comma) +
  labs(
    title = "Type 1 attendances",
    subtitle = "March 2019",
    y = "Attendances",
    x = "Provider Code"
  )

plot(bar)

Add labels

The code below adds labels to your simple bar chart.

#Add value labels to the existing bar chart
bar + geom_text(aes(label = scales::comma(attendances)), vjust =2, color= "White")

Plotly

NEY TO ADD IN

Grouped bar chart

ggplot

#Filter initial dataset
grouped_bar_df <- Attends %>%
  filter(period == "2017-03-01" | period == "2019-03-01") %>%
  select(c(1:3))

#Make plot
ggplot(grouped_bar_df,
       aes(
         x = org_code,
         y = attendances,
         fill = as.factor(period)
       )) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_hline(yintercept = 0,
             size = 1,
             colour = "#333333") +
  scale_y_continuous(labels = comma) +
  #NHSRtheme::scale_fill_nhs('blues')+
  labs(
    title = "Attendances have increased in all providers other than Bucks Healthcare",
    subtitle = "March 2017 vs March 2019",
    y = "Attendances",
    x = "Provider Code"
  ) +
  theme(legend.title = element_blank())

Plotly

NEY TO ADD IN

Stacked bar chart

ggplot

AttendsAll <- NHSRdatasets::ae_attendances %>%
  filter(
    org_code == "RXQ" | org_code == "RTH" | org_code == "RHW" |
      org_code == "RTK" | org_code == "RA2"
  ) %>%
  filter(period == '2017-03-01')

ggplot(AttendsAll, aes(fill = type, y = attendances, x = org_code)) +
  geom_bar(position = "stack", stat = "identity") +
  scale_y_continuous(labels = comma) +
  labs(title = "A&E attendances by department type - March 2017",
       y = "Attendances",
       x = "Provider Code") +
  theme(legend.title = element_blank())

Plotly

NEY TO ADD IN

Percent stacked bar chart

ggplot

AttendsAll <- NHSRdatasets::ae_attendances %>%
  filter(
    org_code == "RXQ" | org_code == "RTH" | org_code == "RHW" |
      org_code == "RTK" | org_code == "RA2"
  ) %>%
  filter(period == '2017-03-01')

ggplot(AttendsAll, aes(fill = type, y = attendances, x = org_code)) +
  geom_bar(position = "fill", stat = "identity") +
  scale_y_continuous(labels = percent) +
  labs(title = "A&E attendances by department type - March 2017",
       y = "Share of attendances",
       x = "Provider Code") +
  theme(legend.title = element_blank())
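
With position = "fill", each bar is rescaled so that its segments sum to 1. The same shares can be calculated by hand, which is useful when you want to label or check them. The sketch below uses made-up numbers and hypothetical provider codes.

```r
library(dplyr)

# Made-up attendances for two hypothetical providers (illustrative values only)
toy_attends <- data.frame(
  org_code    = c("AAA", "AAA", "BBB", "BBB"),
  type        = c("1", "other", "1", "other"),
  attendances = c(80, 20, 30, 70)
)

# Share of each provider's total, which is what position = "fill" draws
toy_share <- toy_attends %>%
  group_by(org_code) %>%
  mutate(share = attendances / sum(attendances)) %>%
  ungroup()

toy_share
```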

Plotly

NEY TO ADD IN

Bubble chart

ggplot

#For this example we keep all attendance types for 5 organisations.
#A separate object name is used so the type 1 Attends data set used by other charts is not overwritten.
AttendsBubble <- NHSRdatasets::ae_attendances %>%
  filter(org_code %in% c("RXQ", "RTH", "RHW", "RTK", "RA2"))

# Summarise data for type 1 attendances
type_1_summary <- AttendsBubble %>%
  filter(type == "1") %>%
  group_by(org_code, period) %>%
  summarise(type_1_attendances = sum(attendances, na.rm = TRUE), .groups = "drop")

# Summarise data for non-type 1 attendances (the type column holds "1", "2" or "other")
type_other_summary <- AttendsBubble %>%
  filter(type != "1") %>%
  group_by(org_code, period) %>%
  summarise(type_other_attendances = sum(attendances, na.rm = TRUE), .groups = "drop")

# Summarise total attendances, breaches and admissions
total_summary <- AttendsBubble %>%
  group_by(org_code, period) %>%
  summarise(
    total_attendances = sum(attendances, na.rm = TRUE),
    total_breaches = sum(breaches, na.rm = TRUE),
    total_admissions = sum(admissions, na.rm = TRUE),
    .groups = "drop"
  )

# Merge the summaries into a single data frame
final_summary <- total_summary %>%
  left_join(type_1_summary, by = c("org_code", "period")) %>%
  left_join(type_other_summary, by = c("org_code", "period"))

# Replace NA values with 0 for type_1_attendances and type_other_attendances
final_summary <- final_summary %>%
  mutate(
    type_1_attendances = replace_na(type_1_attendances, 0),
    type_other_attendances = replace_na(type_other_attendances, 0)
  )

# Add percentage columns
final_summary <- final_summary %>%
  mutate(
    perc_admissions_attendances = (total_admissions / total_attendances) * 100,
    perc_type1_attendances_total = (type_1_attendances / total_attendances) * 100,
    perc_breaches_attendances = (total_breaches / total_attendances) * 100
  )

# Copy the summary for plotting
bubble_df <- final_summary

# Calculate size for bubble chart (type 1 share, rescaled so the largest share is 100)
bubble_df <- bubble_df %>%
  mutate(size = perc_type1_attendances_total / max(perc_type1_attendances_total) * 100)

# Create bubble chart
ggplot(bubble_df, aes(x = perc_admissions_attendances, y = perc_breaches_attendances, size = size, color = size)) +
  geom_point(alpha = 0.5) +
  scale_size_continuous(name = "Proportion of type 1") +
  #NHSRtheme::scale_fill_nhs('blues', name = "Proportion of type 1") +
  labs(title = "Bubble Chart of % 4 Hour Breaches vs % converted to admission with % Attendances Type 1 Size",
       x = "Conversion Rate", y = "% 4 Hour Breaches") +
  theme(legend.position = "right")
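
The size column above divides each type 1 percentage by the largest one and multiplies by 100, so the biggest bubble always has size 100. A small worked example with made-up percentages:

```r
# Made-up type 1 percentages for three provider-months (illustrative only)
perc_type1 <- c(40, 60, 80)

# Rescale so the largest value becomes 100
size <- perc_type1 / max(perc_type1) * 100

size
```

Here 80 is the maximum, so the three sizes are 50, 75 and 100.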

Plotly

#For this example we keep all attendance types for 5 organisations.
#A separate object name is used so the type 1 Attends data set used by other charts is not overwritten.
AttendsBubble <- NHSRdatasets::ae_attendances %>%
  filter(org_code %in% c("RXQ", "RTH", "RHW", "RTK", "RA2"))

# Summarise data for type 1 attendances
type_1_summary <- AttendsBubble %>%
  filter(type == "1") %>%
  group_by(org_code, period) %>%
  summarise(type_1_attendances = sum(attendances, na.rm = TRUE), .groups = "drop")

# Summarise data for non-type 1 attendances (the type column holds "1", "2" or "other")
type_other_summary <- AttendsBubble %>%
  filter(type != "1") %>%
  group_by(org_code, period) %>%
  summarise(type_other_attendances = sum(attendances, na.rm = TRUE), .groups = "drop")

# Summarise total attendances, breaches and admissions
total_summary <- AttendsBubble %>%
  group_by(org_code, period) %>%
  summarise(
    total_attendances = sum(attendances, na.rm = TRUE),
    total_breaches = sum(breaches, na.rm = TRUE),
    total_admissions = sum(admissions, na.rm = TRUE),
    .groups = "drop"
  )

# Merge the summaries into a single data frame
final_summary <- total_summary %>%
  left_join(type_1_summary, by = c("org_code", "period")) %>%
  left_join(type_other_summary, by = c("org_code", "period"))

# Replace NA values with 0 for type_1_attendances and type_other_attendances
final_summary <- final_summary %>%
  mutate(
    type_1_attendances = replace_na(type_1_attendances, 0),
    type_other_attendances = replace_na(type_other_attendances, 0)
  )

# Add percentage columns
final_summary <- final_summary %>%
  mutate(
    perc_admissions_attendances = (total_admissions / total_attendances) * 100,
    perc_type1_attendances_total = (type_1_attendances / total_attendances) * 100,
    perc_breaches_attendances = (total_breaches / total_attendances) * 100
  )

# Copy the summary for plotting
bubble_df <- final_summary

# Calculate size for bubble chart (type 1 share, rescaled so the largest share is 100)
bubble_df <- bubble_df %>%
  mutate(size = perc_type1_attendances_total / max(perc_type1_attendances_total) * 100)


# Create bubble chart
plot_ly(bubble_df, x = ~perc_admissions_attendances, y = ~perc_breaches_attendances, text = "Proportion of type 1", type = 'scatter', mode = 'markers', color = ~size, colors = 'Blues', size = ~size, sizes = c(5,20),
        marker = list(sizemode = 'diameter', opacity = 0.7)) %>% 
  layout(title = 'Bubble Chart of % 4 Hour Breaches vs % converted to admission with % Attendances Type 1 Size',
         xaxis = list(showgrid = FALSE, title = "Conversion Rate"),
         yaxis = list(showgrid = FALSE, title = "% 4 Hour Breaches"))

Population Pyramid chart

ggplot

# Plot the population pyramid
ggplot(population_data_filtered, aes(x = Age, y = Value, fill = Sex)) +
  geom_bar(stat = "identity", position = "identity") +
  coord_flip() +
  scale_y_continuous(labels = function(x) comma(abs(x))) +
  labs(title = "Population Age Profile by Gender",
       x = "Age Group",
       y = "Population Count",
       fill = "Gender") +
  NHSRtheme::scale_fill_nhs("blues")

Plotly

NEY TO ADD IN

Radar chart

Insert here

Radial chart

Insert here

Radial Column chart

ggplot

# Create a radial column chart
ggplot(Attends, aes(x = reorder(org_code, -attendances), y = attendances)) +
  geom_col(width = 0.5, fill = "skyblue") +
  coord_polar(start = 0) +
   #NHSRtheme::scale_fill_nhs("blues") +
  scale_y_continuous(labels = function(x) comma(abs(x))) +
  labs(title = "Attendances per Trust",
       x = NULL, y = NULL)

Plotly

NEY TO ADD IN

Span chart

ggplot

# Summarize the data
summary_data <- Attends %>%
  group_by(org_code) %>%
  summarise(min_attendance = min(attendances),
            max_attendance = max(attendances))

# Create the horizontal bar range chart
ggplot(summary_data, aes(y = org_code)) +
  geom_linerange(aes(xmin = min_attendance, xmax = max_attendance), color = "blue", size = 1.5) +
  labs(title = "Range of Type 1 Attendances by Trust between April 2016 and March 2019",
       x = "Number of Attendances",
       y = "Organisation Code") +
   NHSRtheme::scale_fill_nhs("blues")

Plotly

NEY TO ADD IN

Stacked area chart

ggplot

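# Sum attendances per month and trust so each area has one value per month
area_df <- Attends %>%
  count(period, org_code, wt = attendances, name = "attendances")

# Plot the stacked area chart of monthly attendances by trust
ggplot(area_df, aes(x = period, y = attendances, fill = org_code)) +
  geom_area() +
  NHSRtheme::scale_fill_nhs("blues")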

Plotly

NEY TO ADD IN

Creating multiple charts for the same measure

Faceted chart

Use facet_wrap() to create multiple charts split by a subgroup in the data. You can use ncol = or nrow = to specify the number of columns or rows. For example, facet_wrap(~org_code, nrow = 1) puts all charts in a single row.

ggplot(Attends, aes(x = period, y = attendances)) +
  geom_line(colour = "#005EB8", size = 1) +
  facet_wrap(~org_code)+
  labs(title="Type 1 attendances",
       subtitle = "April 2016 to March 2019") +
  expand_limits(y = 0)
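
The single-row layout mentioned above can be sketched as follows. This is an illustrative example on made-up data: toy_facets and its provider codes are hypothetical and do not come from NHSRdatasets.

```r
library(ggplot2)

# Made-up monthly attendances for three hypothetical providers
toy_facets <- data.frame(
  period      = rep(as.Date(c("2019-01-01", "2019-02-01", "2019-03-01")), times = 3),
  org_code    = rep(c("AAA", "BBB", "CCC"), each = 3),
  attendances = c(100, 120, 110, 200, 190, 210, 150, 160, 155)
)

# nrow = 1 places every provider's panel side by side in one row
facet_row <- ggplot(toy_facets, aes(x = period, y = attendances)) +
  geom_line(colour = "#005EB8") +
  facet_wrap(~org_code, nrow = 1) +
  expand_limits(y = 0)

facet_row
```

Swapping nrow = 1 for ncol = 1 would stack the panels in a single column instead.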

Range

Box & Whisker chart

g <- ggplot(Attends, aes(org_code, attendances))
g + geom_boxplot(varwidth = TRUE, fill = "lightblue") + 
    labs(title="A&E Attendances", 
         subtitle="Distribution by Trust",
         caption="Source: A&E Monthly Stats",
         x="Trust",
         y="A&E attendances")

Bullet chart

Insert here

Candlestick chart

Insert here

Error bars chart

Insert here

Funnel Plot

NEY TO ADD IN

Gantt chart

Insert here

Kagi chart

Insert here

Span chart

Insert here

Violin chart

#Violin Plot
ggplot(Attends, aes(org_code, attendances)) +
  geom_violin() +
  labs(title = "A&E Attendances",
       subtitle = "Range by Trust",
       caption = "Source: A&E Monthly Statistics",
       x = "Trust",
       y = "Attendances") +
  scale_fill_brewer(palette = "Blues") +
  theme_classic()

Dumbbell chart

ggplot

The ggtext package can be used to add colour to titles or subtitles. You need to pair it with theme(plot.subtitle = element_markdown(hjust = 0, size = 12)); otherwise the HTML tags are shown as plain text instead of being rendered.

#Prepare data
dumbbell_df <- NHSRdatasets::ae_attendances %>%
  filter(type ==1) %>%
  select(-c(3,6)) %>%
  filter(period == "2017-03-01" | period =="2019-03-01") %>%
  mutate(period =as.numeric(format(period,'%Y'))) %>%
  mutate(period = as.character(period)) %>%
    mutate(performance = 1- (breaches/attendances)) %>%
  select(c(1:2,5)) %>%
  spread(period, performance) %>%
  mutate(gap = `2019` - `2017`) %>%
  arrange(desc(gap)) %>%
  head(10)

#Make plot
dumbell <- ggplot(dumbbell_df, aes(x = `2017`, xend = `2019`, y = reorder(org_code, gap), group = org_code)) + 
  geom_dumbbell(colour = "#dddddd",
                size = 3,
                colour_x = "#41B6E6",
                colour_xend = "#005EB8") +
  scale_x_continuous(labels = scales::percent_format(accuracy=1))+
  geom_vline(xintercept = 0.95, size = 1, colour="#333333", linetype = "dashed") +
  labs(title = "Performance improved for all providers",
    subtitle = "<span style='color: #41B6E6;'>March 2017</span><span style='color: black;'> vs </span><span style='color: #005EB8;'>March 2019</span>") +
  xlab("4 hour performance") +
  ylab("Org code") +
 theme(plot.subtitle = element_markdown(hjust = 0, size = 12))+ theme(legend.position = "none")

plot(dumbell)

Adding annotations

You can use geom_label() to add annotations to an existing plot, or you can add the annotation layer when first creating the ggplot.

dumbell + geom_label(aes(x = 0.9, y = "R1K",label = "Standard"), 
                           hjust = -0.5, 
                           vjust = -0.1, 
                           colour = "#555555",
                           label.size = NA, 
                           family="Arial", 
                           size = 4)
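
As an alternative to adding a label afterwards, the annotation can be included in the same chain that builds the plot. The sketch below is one possible way to do this with annotate(); toy_perf and its values are made up, and the provider codes are hypothetical.

```r
library(ggplot2)

# Made-up 4 hour performance for three hypothetical providers
toy_perf <- data.frame(
  org_code    = c("AAA", "BBB", "CCC"),
  performance = c(0.91, 0.96, 0.88)
)

annotated <- ggplot(toy_perf, aes(x = performance, y = org_code)) +
  geom_point(colour = "#005EB8", size = 3) +
  # Dashed reference line at the 95% standard
  geom_vline(xintercept = 0.95, linetype = "dashed", colour = "#333333") +
  # Text placed just to the left of the reference line, on the CCC row
  annotate("text", x = 0.95, y = "CCC", label = "Standard",
           hjust = 1.1, colour = "#555555")

annotated
```

annotate() takes fixed positions rather than mapping a column, so it adds a single label without needing an extra data frame.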

Plotly

NEY TO ADD IN

Movement or Flow

Sankey Diagram

NEY TO ADD IN